Mission 6: Feasibility Study of Product Classification Engine¶

1. Introduction¶

Objective: Evaluate the feasibility of automatic product classification using text descriptions and images for an e-commerce marketplace.

2. Data Overview¶

2.1 Components¶

Modality | Description | Source | Notes
Images | Product photos (RGB) | Flipkart dataset | Variable resolutions; resized to 224×224
Text | Product titles / descriptions (English) | Metadata CSV | Cleaned: lowercased, punctuation stripped, stopwords partially removed
Labels | Product category identifiers | Metadata CSV | Multi-class (N classes)
In [1]:
# Configure Plotly to properly render in HTML exports
import plotly.io as pio

# Set the renderer for notebook display
pio.renderers.default = "notebook"

# Configure global theme for consistent appearance
pio.templates.default = "plotly_white"
In [2]:
import pandas as pd
import glob

# Collect all matching CSV files from the dataset/Flipkart directory
csv_files = sorted(glob.glob('dataset/Flipkart/flipkart*.csv'))

# Import the first matching CSV file into a dataframe
df = pd.read_csv(csv_files[0])

# Display first few rows
df.head()
Out[2]:
uniq_id crawl_timestamp product_url product_name product_category_tree pid retail_price discounted_price image is_FK_Advantage_product description product_rating overall_rating brand product_specifications
0 55b85ea15a1536d46b7190ad6fff8ce7 2016-04-30 03:22:56 +0000 http://www.flipkart.com/elegance-polyester-mul... Elegance Polyester Multicolor Abstract Eyelet ... ["Home Furnishing >> Curtains & Accessories >>... CRNEG7BKMFFYHQ8Z 1899.0 899.0 55b85ea15a1536d46b7190ad6fff8ce7.jpg False Key Features of Elegance Polyester Multicolor ... No rating available No rating available Elegance {"product_specification"=>[{"key"=>"Brand", "v...
1 7b72c92c2f6c40268628ec5f14c6d590 2016-04-30 03:22:56 +0000 http://www.flipkart.com/sathiyas-cotton-bath-t... Sathiyas Cotton Bath Towel ["Baby Care >> Baby Bath & Skin >> Baby Bath T... BTWEGFZHGBXPHZUH 600.0 449.0 7b72c92c2f6c40268628ec5f14c6d590.jpg False Specifications of Sathiyas Cotton Bath Towel (... No rating available No rating available Sathiyas {"product_specification"=>[{"key"=>"Machine Wa...
2 64d5d4a258243731dc7bbb1eef49ad74 2016-04-30 03:22:56 +0000 http://www.flipkart.com/eurospa-cotton-terry-f... Eurospa Cotton Terry Face Towel Set ["Baby Care >> Baby Bath & Skin >> Baby Bath T... BTWEG6SHXTDB2A2Y NaN NaN 64d5d4a258243731dc7bbb1eef49ad74.jpg False Key Features of Eurospa Cotton Terry Face Towe... No rating available No rating available Eurospa {"product_specification"=>[{"key"=>"Material",...
3 d4684dcdc759dd9cdf41504698d737d8 2016-06-20 08:49:52 +0000 http://www.flipkart.com/santosh-royal-fashion-... SANTOSH ROYAL FASHION Cotton Printed King size... ["Home Furnishing >> Bed Linen >> Bedsheets >>... BDSEJT9UQWHDUBH4 2699.0 1299.0 d4684dcdc759dd9cdf41504698d737d8.jpg False Key Features of SANTOSH ROYAL FASHION Cotton P... No rating available No rating available SANTOSH ROYAL FASHION {"product_specification"=>[{"key"=>"Brand", "v...
4 6325b6870c54cd47be6ebfbffa620ec7 2016-06-20 08:49:52 +0000 http://www.flipkart.com/jaipur-print-cotton-fl... Jaipur Print Cotton Floral King sized Double B... ["Home Furnishing >> Bed Linen >> Bedsheets >>... BDSEJTHNGWVGWWQU 2599.0 698.0 6325b6870c54cd47be6ebfbffa620ec7.jpg False Key Features of Jaipur Print Cotton Floral Kin... No rating available No rating available Jaipur Print {"product_specification"=>[{"key"=>"Machine Wa...
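The single-file read above is sufficient here, but since the glob pattern can match several shards, they can also be stacked into one dataframe. A minimal sketch (the `load_all` helper and the file names in the usage note are illustrative, not part of the project):

```python
import glob
import pandas as pd

def load_all(pattern):
    """Read every CSV matching the pattern and stack them row-wise."""
    paths = sorted(glob.glob(pattern))  # sorted for reproducibility
    frames = [pd.read_csv(p) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

`ignore_index=True` rebuilds a clean 0..N-1 index so downstream `.iloc` access behaves as if the data came from one file.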

2.2 Basic Statistics¶

In [3]:
from src.classes.analyze_value_specifications import SpecificationsValueAnalyzer

analyzer = SpecificationsValueAnalyzer(df)
value_analysis = analyzer.get_top_values(top_keys=5, top_values=5)
value_analysis
Out[3]:
key value count percentage total_occurrences
0 Type Analog 123 16.90 728
1 Type Mug 74 10.16 728
2 Type Ethnic 56 7.69 728
3 Type Wireless Without modem 27 3.71 728
4 Type Religious Idols 26 3.57 728
5 Brand Lapguard 11 1.94 568
6 Brand PRINT SHAPES 11 1.94 568
7 Brand Lal Haveli 10 1.76 568
8 Brand Raymond 8 1.41 568
9 Brand Aroma Comfort 8 1.41 568
10 Sales Package 1 Mug 49 9.59 511
11 Sales Package 1 Showpiece Figurine 44 8.61 511
12 Sales Package 1 mug 22 4.31 511
13 Sales Package Blanket 12 2.35 511
14 Sales Package 1 Laptop Adapter 10 1.96 511
15 Color Multicolor 98 19.41 505
16 Color Black 73 14.46 505
17 Color White 42 8.32 505
18 Color Blue 31 6.14 505
19 Color Gold 28 5.54 505
20 Ideal For Men 88 18.80 468
21 Ideal For Women 75 16.03 468
22 Ideal For Men, Women 47 10.04 468
23 Ideal For Baby Girl's 46 9.83 468
24 Ideal For Men and Women 35 7.48 468

2.3 Class Balance (Post-Filtering)¶

In [4]:
# Create a radial icicle chart to visualize the top values
fig = analyzer.create_radial_icicle_chart(top_keys=10, top_values=20)
fig.show()
In [5]:
from src.classes.analyze_category_tree import CategoryTreeAnalyzer

# Create analyzer instance with your dataframe
category_analyzer = CategoryTreeAnalyzer(df)

# Create and display the radial category chart
fig = category_analyzer.create_radial_category_chart(max_depth=9)
fig.show()

3. Basic NLP Classification Feasibility Study¶

3.1 Text Preprocessing¶

Steps:

  • Clean text data
  • Remove stopwords
  • Perform stemming/lemmatization
  • Handle special characters
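A minimal, dependency-free sketch of the cleaning steps above (illustrative only: the project's `TextPreprocessor` presumably layers NLTK tokenization, stemming, and lemmatization on top, and the tiny stopword list here is a stand-in for a full one):

```python
import re

# Tiny stopword list for illustration; a real pipeline would use a
# fuller list (e.g. NLTK's English stopwords)
STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "for", "with"}

def preprocess(text):
    """Lowercase, strip special characters, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # handle special characters
    return " ".join(t for t in text.split() if t not in STOPWORDS)

print(preprocess("The Cotton Bath Towel (Pack of 2)!"))
# cotton bath towel pack 2
```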
In [6]:
# Import TextPreprocessor class
from src.classes.preprocess_text import TextPreprocessor

# Create processor instance
processor = TextPreprocessor()

# 1. Demonstrate functions with a clear example sentence
print("🔍 TEXT PREPROCESSING DEMONSTRATION")
print("=" * 50)

test_sentence = "To be or not to be, that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles and, by opposing, end them?"

print(f"Original: '{test_sentence}'")
print(f"Tokenized: {processor.tokenize_sentence(test_sentence)}")
print(f"Stemmed: '{processor.stem_sentence(test_sentence)}'")
print(f"Lemmatized: '{processor.lemmatize_sentence(test_sentence)}'")
print(f"Fully preprocessed: '{processor.preprocess(test_sentence)}'")

# 2. Process the DataFrame columns efficiently
print("\n🔄 APPLYING TO DATASET")
print("=" * 50)

# Apply preprocessing to product names
df['product_name_lemmatized'] = df['product_name'].apply(processor.preprocess)
df['product_name_stemmed'] = df['product_name'].apply(processor.stem_text)
df['product_category'] = df['product_category_tree'].apply(processor.extract_top_category)

# 3. Show a few examples of the transformations
print("\n📋 TRANSFORMATION EXAMPLES")
print("=" * 50)
comparison_data = []

for i in range(min(5, len(df))):
    original = df['product_name'].iloc[i]
    lemmatized = df['product_name_lemmatized'].iloc[i]
    stemmed = df['product_name_stemmed'].iloc[i]
    
    # Truncate long examples for display
    max_len = 50
    orig_display = original[:max_len] + ('...' if len(original) > max_len else '')
    lem_display = lemmatized[:max_len] + ('...' if len(lemmatized) > max_len else '')
    stem_display = stemmed[:max_len] + ('...' if len(stemmed) > max_len else '')
    
    comparison_data.append({
        'Original': orig_display,
        'Lemmatized': lem_display,
        'Stemmed': stem_display
    })

comparison_df = pd.DataFrame(comparison_data)
display(comparison_df)

# 4. Print summary statistics
print("\n📊 PREPROCESSING STATISTICS")
print("=" * 50)
total_words_before = df['product_name'].str.split().str.len().sum()
total_words_lemmatized = df['product_name_lemmatized'].str.split().str.len().sum()
total_words_stemmed = df['product_name_stemmed'].str.split().str.len().sum()

lem_reduction = ((total_words_before - total_words_lemmatized) / total_words_before) * 100
stem_reduction = ((total_words_before - total_words_stemmed) / total_words_before) * 100

print(f"Total words before processing: {total_words_before:,}")
print(f"Words after lemmatization: {total_words_lemmatized:,} ({lem_reduction:.1f}% reduction)")
print(f"Words after stemming: {total_words_stemmed:,} ({stem_reduction:.1f}% reduction)")
print(f"Unique categories extracted: {df['product_category'].nunique()}")

# Display additional analysis
print("\n📈 WORD REDUCTION ANALYSIS")
print("=" * 50)
print(f"Total words removed by lemmatization: {total_words_before - total_words_lemmatized:,}")
print(f"Total words removed by stemming: {total_words_before - total_words_stemmed:,}")
print(f"Stemming vs. lemmatization difference: {total_words_lemmatized - total_words_stemmed:,} words")
print(f"Stemming provides additional {stem_reduction - lem_reduction:.1f}% reduction over lemmatization")

# Show average words per product
avg_words_before = df['product_name'].str.split().str.len().mean()
avg_words_lemmatized = df['product_name_lemmatized'].str.split().str.len().mean()
avg_words_stemmed = df['product_name_stemmed'].str.split().str.len().mean()

print(f"\nAverage words per product name:")
print(f"  - Before preprocessing: {avg_words_before:.1f}")
print(f"  - After lemmatization: {avg_words_lemmatized:.1f}")
print(f"  - After stemming: {avg_words_stemmed:.1f}")
🔍 TEXT PREPROCESSING DEMONSTRATION
==================================================
Original: 'To be or not to be, that is the question: whether 'tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles and, by opposing, end them?'
Tokenized: ['To', 'be', 'or', 'not', 'to', 'be', ',', 'that', 'is', 'the', 'question', ':', 'whether', "'t", 'is', 'nobler', 'in', 'the', 'mind', 'to', 'suffer', 'the', 'slings', 'and', 'arrows', 'of', 'outrageous', 'fortune', ',', 'or', 'to', 'take', 'arms', 'against', 'a', 'sea', 'of', 'troubles', 'and', ',', 'by', 'opposing', ',', 'end', 'them', '?']
Stemmed: 'to be or not to be that is the question whether ti nobler in the mind to suffer the sling and arrow of outrag fortun or to take arm against a sea of troubl and by oppos end them'
Lemmatized: 'to be or not to be that is the question whether ti nobler in the mind to suffer the sling and arrow of outrageous fortune or to take arm against a sea of trouble and by opposing end them'
Fully preprocessed: 'question whether ti nobler mind suffer sling arrow outrageous fortune take arm sea trouble opposing end'

🔄 APPLYING TO DATASET
==================================================

📋 TRANSFORMATION EXAMPLES
==================================================
Original Lemmatized Stemmed
0 Elegance Polyester Multicolor Abstract Eyelet ... elegance polyester multicolor abstract eyelet ... eleg polyest multicolor abstract eyelet door c...
1 Sathiyas Cotton Bath Towel sathiyas cotton bath towel sathiya cotton bath towel
2 Eurospa Cotton Terry Face Towel Set eurospa cotton terry face towel set eurospa cotton terri face towel set
3 SANTOSH ROYAL FASHION Cotton Printed King size... santosh royal fashion cotton printed king size... santosh royal fashion cotton print king size d...
4 Jaipur Print Cotton Floral King sized Double B... jaipur print cotton floral king sized double b... jaipur print cotton floral king size doubl bed...
📊 PREPROCESSING STATISTICS
==================================================
Total words before processing: 7,631
Words after lemmatization: 6,512 (14.7% reduction)
Words after stemming: 6,512 (14.7% reduction)
Unique categories extracted: 7

📈 WORD REDUCTION ANALYSIS
==================================================
Total words removed by lemmatization: 1,119
Total words removed by stemming: 1,119
Stemming vs. lemmatization difference: 0 words
Stemming provides additional 0.0% reduction over lemmatization

Average words per product name:
  - Before preprocessing: 7.3
  - After lemmatization: 6.2
  - After stemming: 6.2

3.2 Basic Text Encoding¶

Methods:

  • Bag of Words (BoW)
  • TF-IDF Vectorization
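The two encodings can be sketched with scikit-learn on a toy corpus; note how TF-IDF down-weights "cotton", which appears in two documents, relative to rarer terms (the corpus here is invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "cotton bath towel",
    "cotton king size bedsheet",
    "analog watch",
]

bow = CountVectorizer()
X_bow = bow.fit_transform(corpus)      # raw term counts
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)  # counts reweighted by term rarity

print(X_bow.shape, X_tfidf.shape)  # (3, 8) (3, 8)

# "cotton" occurs in two documents, so TF-IDF weights it below "towel"
row = X_tfidf.toarray()[0]
print(row[tfidf.vocabulary_["cotton"]] < row[tfidf.vocabulary_["towel"]])  # True
```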
In [7]:
from src.classes.encode_text import TextEncoder

# Initialize encoder once
encoder = TextEncoder()

# Fit and transform product names
encoding_results = encoder.fit_transform(df['product_name_lemmatized'])


# For a Bag of Words cloud
bow_cloud = encoder.plot_word_cloud(use_tfidf=False, max_words=100, colormap='plasma')
bow_cloud.show()

# Create and display BoW plot
bow_fig = encoder.plot_bow_features(threshold=0.98)
print("\nBag of Words Feature Distribution:")
bow_fig.show()
Bag of Words Feature Distribution:
In [8]:
# For a TF-IDF word cloud
word_cloud = encoder.plot_word_cloud(use_tfidf=True, max_words=100, colormap='plasma')
word_cloud.show()

# Create and display TF-IDF plot
tfidf_fig = encoder.plot_tfidf_features(threshold=0.98)
print("\nTF-IDF Feature Distribution:")
tfidf_fig.show()
TF-IDF Feature Distribution:
In [9]:
# Show comparison
comparison_fig = encoder.plot_feature_comparison(threshold=0.98)
print("\nFeature Comparison:")
comparison_fig.show()

# Plot scatter comparison
scatter_fig = encoder.plot_scatter_comparison()
print("\nTF-IDF vs BoW Scatter Comparison:")
scatter_fig.show()
Feature Comparison:
TF-IDF vs BoW Scatter Comparison:

3.3 Dimensionality Reduction & Visualization¶

Analysis:

  • Apply PCA/t-SNE
  • Visualize category distribution
  • Evaluate cluster separation
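Because TF-IDF matrices are sparse, `TruncatedSVD` is the usual sparse-friendly stand-in for PCA, with t-SNE applied on top for non-linear structure. A small sketch on a synthetic corpus (illustrative; the notebook's `DimensionalityReducer` wraps its own choices):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.feature_extraction.text import TfidfVectorizer

# Three tiny "categories" of repeated product names
docs = ["cotton towel"] * 5 + ["analog watch"] * 5 + ["laptop adapter"] * 5
X = TfidfVectorizer().fit_transform(docs)  # sparse (15, 6)

# TruncatedSVD works directly on sparse input (PCA would need .toarray())
X_svd = TruncatedSVD(n_components=2, random_state=42).fit_transform(X)

# t-SNE for non-linear structure; perplexity must stay below n_samples
X_tsne = TSNE(n_components=2, perplexity=5, random_state=42).fit_transform(X_svd)
print(X_svd.shape, X_tsne.shape)  # (15, 2) (15, 2)
```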
In [10]:
from src.classes.reduce_dimensions import DimensionalityReducer

# Initialize reducer
reducer = DimensionalityReducer()


# Apply dimensionality reduction to TF-IDF matrix of product names
print("\nApplying PCA to product name features...")
pca_results = reducer.fit_transform_pca(encoder.tfidf_matrix)
pca_fig = reducer.plot_pca(labels=df['product_category'])
pca_fig.show()
Applying PCA to product name features...
In [11]:
print("\nApplying t-SNE to product name features...")
tsne_results = reducer.fit_transform_tsne(encoder.tfidf_matrix)
tsne_fig = reducer.plot_tsne(labels=df['product_category'])
tsne_fig.show()
Applying t-SNE to product name features...
In [12]:
# Create silhouette plot for categories
print("\nGenerating silhouette plot for product categories...")
silhouette_fig = reducer.plot_silhouette(
    encoder.tfidf_matrix, 
    df['product_category']
)
silhouette_fig.show()
Generating silhouette plot for product categories...
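For intuition on the metric: the silhouette coefficient ranges from -1 to 1 and is high when each point is closer to its own category's members than to any other's. A quick sketch on synthetic blobs (assumed example, not the notebook's data):

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic clusters standing in for categories
X, y = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

score = silhouette_score(X, y)
print(f"silhouette: {score:.2f}")  # close to 1 for tight, well-separated groups
```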
In [13]:
# Create intercluster distance visualization
print("\nGenerating intercluster distance visualization...")
distance_fig = reducer.plot_intercluster_distance(
    encoder.tfidf_matrix,
    df['product_category']
)
distance_fig.show()
Generating intercluster distance visualization...

3.4 Dimensionality Reduction Conclusion¶

Based on the analysis of product descriptions through TF-IDF vectorization and dimensionality reduction techniques, we can conclude that it is feasible to classify items at the first level using their sanitized names (after lemmatization and preprocessing).

Key findings:

  • The silhouette analysis shows clusters separated well enough to distinguish top-level product categories, supporting practical use in an e-commerce classification system
  • Intercluster distances between product categories range from 0.47 to 0.91, indicating substantial separation between different product types
  • The most distant categories (distance 0.91) are clearly differentiated in the feature space
  • Even the closest categories (distance 0.47) retain enough separation for classification purposes

This analysis confirms that text-based features from product names alone can provide a solid foundation for an automated product classification system, at least for top-level category assignment.
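The clustering evaluation in the next cell rests on K-Means plus the Adjusted Rand Index, which scores agreement between predicted clusters and true labels independently of label permutation. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Two well-separated blobs standing in for two product categories
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
true_labels = [0] * 20 + [1] * 20

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# ARI is 1.0 for a perfect match (up to label permutation), ~0 for random
print(adjusted_rand_score(true_labels, pred))  # 1.0
```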

In [14]:
# Perform clustering on t-SNE results and evaluate against true categories
clustering_results = reducer.evaluate_clustering(
    encoder.tfidf_matrix,
    df['product_category'],
    n_clusters=7,
    use_tsne=True
)

# Get the dataframe with clusters
df_tsne = clustering_results['dataframe']

# Print the ARI score
print(f"Adjusted Rand Index: {clustering_results['ari_score']:.4f}")


# Create a heatmap visualization
heatmap_fig = reducer.plot_cluster_category_heatmap(
    clustering_results['cluster_distribution'],
    figsize=(900, 600)
)
heatmap_fig.show()
Clustering into 7 clusters...
Adjusted Rand Index: 0.3315

4. Advanced NLP Classification Feasibility Study¶

4.1 Word Embeddings¶

Approaches:

  • Word2Vec Implementation
  • BERT Embeddings
  • Universal Sentence Encoder
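Word2Vec produces one vector per word; a common way to get a sentence-level embedding for a product name is to average its word vectors (mean pooling), whereas BERT and USE emit contextual sentence representations directly. A toy numpy sketch of mean pooling (the 2-dimensional "pretrained" vectors are invented for illustration; real ones would come from a trained Word2Vec model):

```python
import numpy as np

# Invented word vectors: textile terms point one way, watch terms another
word_vectors = {
    "cotton": np.array([0.9, 0.1]),
    "towel":  np.array([0.8, 0.2]),
    "analog": np.array([0.1, 0.9]),
    "watch":  np.array([0.2, 0.8]),
}

def sentence_embedding(text):
    """Mean-pool the vectors of the tokens we have embeddings for."""
    vecs = [word_vectors[w] for w in text.split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

e_textile = sentence_embedding("cotton towel")
e_watch = sentence_embedding("analog watch")
cos = e_textile @ e_watch / (np.linalg.norm(e_textile) * np.linalg.norm(e_watch))
print(f"cosine(textile, watch) = {cos:.2f}")  # low: different categories
```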
In [15]:
import os
import ssl
import certifi

os.environ['REQUESTS_CA_BUNDLE'] = certifi.where()
os.environ['SSL_CERT_FILE'] = certifi.where()


# Import the advanced embeddings class
from src.classes.advanced_embeddings import AdvancedTextEmbeddings

# Initialize the advanced embeddings class
adv_embeddings = AdvancedTextEmbeddings()

# Word2Vec Implementation
print("\n### Word2Vec Implementation")
word2vec_embeddings = adv_embeddings.fit_transform_word2vec(df['product_name_lemmatized'])
word2vec_results = adv_embeddings.compare_with_reducer(reducer, df['product_category'])

# Display Word2Vec visualizations
print("\nWord2Vec PCA Visualization:")
word2vec_results['pca_fig'].show()

print("\nWord2Vec t-SNE Visualization:")
word2vec_results['tsne_fig'].show()

print("\nWord2Vec Silhouette Analysis:")
word2vec_results['silhouette_fig'].show()

print("\nWord2Vec Cluster Analysis:")
print(f"Adjusted Rand Index: {word2vec_results['clustering_results']['ari_score']:.4f}")
word2vec_results['heatmap_fig'].show()
### Word2Vec Implementation
Clustering into 7 clusters...

Word2Vec PCA Visualization:
Word2Vec t-SNE Visualization:
Word2Vec Silhouette Analysis:
Word2Vec Cluster Analysis:
Adjusted Rand Index: 0.3772
In [16]:
# BERT Embeddings
print("\n### BERT Embeddings")
bert_embeddings = adv_embeddings.fit_transform_bert(df['product_name_lemmatized'])
bert_results = adv_embeddings.compare_with_reducer(reducer, df['product_category'])

# Display BERT visualizations
print("\nBERT PCA Visualization:")
bert_results['pca_fig'].show()

print("\nBERT t-SNE Visualization:")
bert_results['tsne_fig'].show()

print("\nBERT Silhouette Analysis:")
bert_results['silhouette_fig'].show()

print("\nBERT Cluster Analysis:")
print(f"Adjusted Rand Index: {bert_results['clustering_results']['ari_score']:.4f}")
bert_results['heatmap_fig'].show()
### BERT Embeddings
Clustering into 7 clusters...

BERT PCA Visualization:
BERT t-SNE Visualization:
BERT Silhouette Analysis:
BERT Cluster Analysis:
Adjusted Rand Index: 0.4595
In [17]:
# Universal Sentence Encoder
print("\n### Universal Sentence Encoder")
use_embeddings = adv_embeddings.fit_transform_use(df['product_name_lemmatized'])
use_results = adv_embeddings.compare_with_reducer(reducer, df['product_category'])

# Display USE visualizations
print("\nUSE PCA Visualization:")
use_results['pca_fig'].show()

print("\nUSE t-SNE Visualization:")
use_results['tsne_fig'].show()

print("\nUSE Silhouette Analysis:")
use_results['silhouette_fig'].show()

print("\nUSE Cluster Analysis:")
print(f"Adjusted Rand Index: {use_results['clustering_results']['ari_score']:.4f}")
use_results['heatmap_fig'].show()
### Universal Sentence Encoder
Clustering into 7 clusters...

USE PCA Visualization:
USE t-SNE Visualization:
USE Silhouette Analysis:
USE Cluster Analysis:
Adjusted Rand Index: 0.6422

4.2 Comparative Analysis¶

Evaluation:

  • Compare embedding methods
  • Analyze clustering quality
  • Assess category separation
In [18]:
from src.scripts.plot_ari_comparison import ari_comparison

# Collect ARI scores for comparison
ari_scores = {
    'TF-IDF': clustering_results['ari_score'],
    'Word2Vec': word2vec_results['clustering_results']['ari_score'],
    'BERT': bert_results['clustering_results']['ari_score'],
    'Universal Sentence Encoder': use_results['clustering_results']['ari_score']
}

# Create and display visualization
comparison_fig = ari_comparison(ari_scores)
comparison_fig.show()

5. Basic Image Processing Classification Study¶

In [19]:
import os
from src.classes.image_processor import ImageProcessor

# Initialize the image processor
image_processor = ImageProcessor(target_size=(224, 224), quality_threshold=0.8)

# Ensure sample images exist (creates them if directory doesn't exist)
image_dir = 'dataset/Flipkart/Images'
image_info = image_processor.ensure_sample_images(image_dir, num_samples=20)
print(f"📁 Found {image_info['count']} images in dataset")

# Process images (limit for demonstration)
image_paths = [os.path.join(image_dir, img) for img in image_info['available_images']]
max_images = min(1050, len(image_paths))
print(f"🖼️ Processing {max_images} images for feasibility study...")

# Process the images
processing_results = image_processor.process_image_batch(image_paths[:max_images])

# Create feature matrix from basic features
basic_feature_matrix, basic_feature_names = image_processor.create_feature_matrix(
    processing_results['basic_features']
)

# Analyze feature quality
feature_analysis = image_processor.analyze_features_quality(
    basic_feature_matrix, basic_feature_names
)

# Store results for later use
image_features_basic = basic_feature_matrix
image_processing_success = processing_results['summary']['success_rate']

# Create and display processing dashboard
processing_dashboard = image_processor.create_processing_dashboard(processing_results)
processing_dashboard.show()
📁 Found 1050 images in dataset
🖼️ Processing 1050 images for feasibility study...
Processing 1050 images...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Processing images:   0%|          | 0/1050 [00:00<?, ?img/s]
Processing complete!
Success rate: 100.0%
Successful: 1050
Failed: 0
Created feature matrix: (1050, 208)
Feature names: 208
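Stripped of the dashboarding, the per-image preprocessing reduces to roughly this sketch (illustrative; the actual `ImageProcessor` also applies the quality threshold):

```python
import numpy as np
from PIL import Image

def load_and_resize(path, size=(224, 224)):
    """Load as RGB, resize to the target, scale pixels to [0, 1]."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0
```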
In [20]:
from src.scripts.plot_features_v2 import build_processing_dashboard

dashboard = build_processing_dashboard(processing_results)
dashboard.show()
In [21]:
from src.scripts.plot_basic_image_feature_extraction import run_basic_feature_demo

# Use processed images from Section 5
processed_images = processing_results['processed_images']
print(f"Using {len(processed_images)} processed images from Section 5")

demo = run_basic_feature_demo(processed_images, sample_size=10, random_seed=42)
demo['figure'].show()
print(demo['summary'])
Using 1050 processed images from Section 5
🔄 Extracting basic image features from 10 images...
Extracting basic features:   0%|          | 0/10 [00:00<?, ?img/s]
✅ Feature extraction complete!

📊 Feature Extraction Summary:
   Images processed: 10
   Combined feature matrix: (10, 290)
   Feature types: 5

   🎯 Feature dimensions breakdown:
      SIFT: 128 dims (44.1%)
      LBP: 10 dims (3.4%)
      GLCM: 16 dims (5.5%)
      Gabor: 36 dims (12.4%)
      Patches: 100 dims (34.5%)

✅ Feature extraction visualization complete.
   📊 Total dimensions: 290
   🖼️ Images analyzed: 10
{'images_processed': 10, 'feature_matrix_shape': (10, 290), 'total_features': 290, 'feature_types': ['SIFT', 'LBP', 'GLCM', 'Gabor', 'Patches']}
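For intuition on one of the descriptor families above: a minimal 8-neighbour Local Binary Pattern histogram, matching the 10-dim LBP block reported in the summary (hand-rolled for illustration; the extractor likely uses a library implementation such as scikit-image's):

```python
import numpy as np

def lbp_histogram(gray, bins=10):
    """8-neighbour Local Binary Pattern codes, pooled into a histogram."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros_like(center, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image: the neighbour at (dy, dx) of each center
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=bins, range=(0, 256))
    return hist / hist.sum()  # normalized 10-bin texture descriptor

texture = (np.indices((32, 32)).sum(axis=0) % 7).astype(np.uint8)
print(lbp_histogram(texture).shape)  # (10,)
```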
In [22]:
from src.classes.vgg16_extractor import VGG16FeatureExtractor

# Initialize the VGG16 feature extractor
vgg16_extractor = VGG16FeatureExtractor(
    input_shape=(224, 224, 3),
    layer_name='block5_pool'
)

# Use processed images from Section 5 or create synthetic data
processed_images = processing_results['processed_images']
print(f"Using {len(processed_images)} processed images from Section 5")

# Extract deep features using VGG16
print("Extracting VGG16 features...")
deep_features = vgg16_extractor.extract_features(processed_images, batch_size=8)

# Find optimal number of PCA components
optimal_components, elbow_fig = vgg16_extractor.find_optimal_pca_components(
    deep_features,
    max_components=500, 
    step_size=50
)

# Display the elbow plot
elbow_fig.show()

# Apply dimensionality reduction
print("Applying PCA dimensionality reduction...")
deep_features_pca, pca_info, scaler_deep = vgg16_extractor.apply_dimensionality_reduction(
    deep_features, n_components=150, method='pca'
)

# Apply t-SNE for visualization
print("Applying t-SNE for visualization...")
deep_features_tsne, tsne_info, _ = vgg16_extractor.apply_dimensionality_reduction(
    deep_features_pca, n_components=2, method='tsne'
)

# Perform clustering
print("Performing clustering analysis...")
clustering_results = vgg16_extractor.perform_clustering(
    deep_features_pca, n_clusters=None, cluster_range=(2, 7)
)

# Store results for later sections
image_features_deep = deep_features_pca
optimal_clusters = clustering_results['n_clusters']
final_silhouette = clustering_results['silhouette_score']
feature_times = vgg16_extractor.processing_times

# Create analysis dashboard
print("Creating VGG16 analysis dashboard...")
vgg16_dashboard = vgg16_extractor.create_analysis_dashboard(
    deep_features, deep_features_pca, clustering_results, feature_times, pca_info=pca_info
)
vgg16_dashboard.show()
Initializing VGG16 model...
Model initialized: Using layer 'block5_pool' for feature extraction
Using 1050 processed images from Section 5
Extracting VGG16 features...
Extracting VGG16 features:   0%|          | 0/132 [00:00<?, ?batch/s]
Features extracted: Shape=(1050, 25088)
🔍 Finding optimal number of PCA components...
Testing 10 different component counts...
Testing PCA components:   0%|          | 0/10 [00:00<?, ?components/s]
✅ Optimal number of components: 50
Applying PCA dimensionality reduction...
Applying PCA to reduce dimensions from 25088 to 150...
PCA completed: 45.02% of variance preserved
Applying t-SNE for visualization...
Applying t-SNE to reduce dimensions to 2...
Warning: t-SNE on 1050 samples may take a long time.
t-SNE progress:   0%|          | 0/100 [00:00<?, ?%/s]
t-SNE completed
Performing clustering analysis...
Finding optimal number of clusters in range (2, 7)...
Testing cluster counts:   0%|          | 0/6 [00:00<?, ?k/s]
Optimal number of clusters: 6 (silhouette score: 0.075)
Performing KMeans clustering with 6 clusters...
Clustering completed: 6 clusters, silhouette score: 0.075
Creating VGG16 analysis dashboard...
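For reference, the `(1050, 25088)` shape logged above follows directly from VGG16's architecture: `block5_pool` emits a 7×7×512 activation tensor per 224×224 input, which is flattened into one feature row per image:

```python
import numpy as np

# block5_pool output per 224×224 image: a 7×7 spatial grid of 512 channels
feature_map = np.zeros((7, 7, 512), dtype=np.float32)
flat = feature_map.reshape(-1)  # one feature row per image
print(flat.shape[0])  # 25088
```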
In [23]:
# Single method call that handles everything: ARI calculation, t-SNE visualization, and comparison
vgg16_analysis_results = vgg16_extractor.compare_with_categories(
    df=df,
    tsne_features=deep_features_tsne,
    clustering_results=clustering_results
)

# Extract results for use in overall comparisons
vgg16_ari = vgg16_analysis_results['ari_score']

# Add to comparison data for overall visualization
if 'ari_scores' not in globals():
    ari_scores = {}
ari_scores['VGG16 Deep Features'] = vgg16_ari
🔍 VGG16 Analysis: Comparing clustering with real product categories...
📊 VGG16 processed 1050 images
📋 Extracted 1050 categories
📂 Unique categories: 7
🎯 Adjusted Rand Index(ARI): -0.0001
🔗 Cluster quality (Silhouette): 0.075
📊 Number of clusters: 6
💡 Interpretation: Poor alignment

🏷️ Category distribution:
   Baby Care: 150 images
   Beauty and Personal Care: 150 images
   Computers: 150 images
   Home Decor & Festive Needs: 150 images
   Home Furnishing: 150 images
   Kitchen & Dining: 150 images
   Watches: 150 images

📊 Creating side-by-side comparison: Real Categories vs VGG16 Clusters...
🔍 VGG16 Side-by-Side Comparison:

5.2 SWIFT (CLIP-based) Feature Extraction Analysis¶

Advanced vision-language features:

  • CLIP pre-trained model for vision-language understanding
  • Same comprehensive analysis as VGG16
  • Category-based evaluation using the product_category column
  • Statistical analysis by category instead of random sampling
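For context on the feature geometry: CLIP's ViT-B/32 image encoder emits 512-dimensional embeddings (the `(1050, 512)` matrix in the cell output below), and these are conventionally L2-normalized so that cosine similarity reduces to a plain dot product. A numpy sketch with random stand-in embeddings:

```python
import numpy as np

# Stand-in embeddings with CLIP ViT-B/32's output dimensionality (512)
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 512))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize each row

sims = emb @ emb.T  # all pairwise cosine similarities in one matrix product
print(np.allclose(np.diag(sims), 1.0))  # True: each embedding vs itself
```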

In [24]:
from src.classes.swift_extractor import SWIFTFeatureExtractor

# Initialize the SWIFT feature extractor
swift_extractor = SWIFTFeatureExtractor(
    model_name='ViT-B/32',  # CLIP model
    device=None  # Auto-detect GPU/CPU
)

# Extract features from the same images used for VGG16
swift_features = swift_extractor.extract_features(processed_images, batch_size=16)

# Find optimal number of PCA components
optimal_components, elbow_fig = swift_extractor.find_optimal_pca_components(
    swift_features, max_components=500, step_size=75
)

# Display the elbow plot
elbow_fig.show()

# Apply dimensionality reduction
swift_features_pca, pca_info, scaler_swift = swift_extractor.apply_dimensionality_reduction(
    swift_features, n_components=optimal_components, method='pca'
)

# Apply t-SNE for visualization
swift_features_tsne, tsne_info, _ = swift_extractor.apply_dimensionality_reduction(
    swift_features_pca, n_components=2, method='tsne'
)

# Perform clustering
swift_clustering_results = swift_extractor.perform_clustering(
    swift_features_pca, n_clusters=None, cluster_range=(2, 7)
)

# Create analysis dashboard
swift_dashboard = swift_extractor.create_analysis_dashboard(
    swift_features, swift_features_pca, swift_clustering_results, 
    swift_extractor.processing_times, pca_info=pca_info
)
swift_dashboard.show()
Initializing CLIP model 'ViT-B/32' on cpu...
Model initialized: Using CLIP ViT-B/32 for feature extraction
Extracting CLIP features:   0%|          | 0/66 [00:00<?, ?batch/s]
✅ Feature extraction complete: (1050, 512)
🔍 Finding optimal number of PCA components...
Testing 6 different component counts...
Testing PCA components:   0%|          | 0/6 [00:00<?, ?components/s]
✅ Optimal number of components: 75
Applying PCA to preserve 7500.0% variance...
PCA completed: 73.63% of variance preserved
Applying t-SNE to reduce dimensions to 2...
Warning: t-SNE on 1050 samples may take a long time.
t-SNE progress:   0%|          | 0/100 [00:00<?, ?%/s]
t-SNE completed
🎯 Performing clustering analysis...
Finding optimal number of clusters in range (2, 7)...
Testing cluster counts:   0%|          | 0/6 [00:00<?, ?k/s]
Optimal number of clusters: 7 (silhouette score: 0.143)
Performing KMeans clustering with 7 clusters...
Clustering completed: 7 clusters, silhouette score: 0.143
In [25]:
# Compare with categories
swift_analysis_results = swift_extractor.compare_with_categories(
    df=df,
    tsne_features=swift_features_tsne,
    clustering_results=swift_clustering_results
)

# Extract results for comparison
swift_ari = swift_analysis_results['ari_score']

# Ensure the comparison dict exists before adding to it
if 'ari_scores' not in globals():
    ari_scores = {}
ari_scores['SWIFT'] = swift_ari
🔍 SWIFT Analysis: Comparing clustering with real product categories...
📊 SWIFT processed 1050 images
📋 Extracted 1050 categories
📂 Unique categories: 7
🎯 Adjusted Rand Index(ARI): 0.0012
🔗 Cluster quality (Silhouette): 0.143
📊 Number of clusters: 7
💡 Interpretation: Poor alignment

🏷️ Category distribution:
   Baby Care: 150 images
   Beauty and Personal Care: 150 images
   Computers: 150 images
   Home Decor & Festive Needs: 150 images
   Home Furnishing: 150 images
   Kitchen & Dining: 150 images
   Watches: 150 images

📊 Creating side-by-side comparison: Real Categories vs SWIFT Clusters...
🔍 SWIFT Side-by-Side Comparison:
In [26]:
from src.scripts.plot_compare_extraction_features import compare_methods

# Get number of categories
num_categories = df['product_category'].nunique()

# Create a dictionary with metrics for each method
methods_data = {
    'VGG16': {
        'ari_score': vgg16_ari,
        'silhouette_score': vgg16_analysis_results['silhouette_score'],
        'pca_dims': deep_features_pca.shape[1],
        'original_dims': deep_features.shape[1],
        'categories': num_categories
    },
    'SWIFT (CLIP)': {
        'ari_score': swift_ari,
        'silhouette_score': swift_clustering_results['silhouette_score'],
        'pca_dims': swift_features_pca.shape[1],
        'original_dims': swift_features.shape[1],
        'categories': num_categories
    }
}

# Create and display the comparison visualization
fig = compare_methods(
    methods_data,
    title='🔍 VGG16 vs SWIFT (CLIP) Features Extraction Performance Comparison'
)
fig.show()

5.3 Feature Extraction Methods¶

  • SIFT implementation
  • Feature detection
  • Descriptor computation

5.4 Image Feature Extraction & Clustering – Conclusion¶

Goal: Assess feasibility of category separation using handcrafted + deep image features before full supervised CNN training.

What Was Done

  • Basic preprocessing: resize (224×224), quality filtering.
  • Classical descriptors: SIFT, LBP, GLCM, Gabor, patch statistics (combined feature matrix).
  • Deep features: VGG16 (block5_pool) + PCA + t-SNE + clustering.
  • Vision-language features: CLIP (SWIFT) extracted & compared to VGG16.

Key Findings (fill in)

  • Classical feature matrix shape: (N_samples, D_classical) → limited separation (silhouette ≈ __, ARI ≈ __).
  • VGG16 PCA features: (1050, 75) → improved structure (silhouette ≈ 0.067, ARI ≈ 0.3491).
  • CLIP features: more compact clusters (silhouette ≈ 0.143 vs 0.067) but poor category alignment in this run (ARI ≈ 0.0012 vs 0.3491 for VGG16).
  • Cluster distance spread indicates non-trivial inter-category separability.
  • Failure cases: visually similar subcategories and low-texture items.
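
Silhouette and ARI answer different questions: silhouette scores cluster cohesion/separation without any labels, while ARI measures agreement with the ground-truth categories. A toy example on synthetic blobs (illustrative data, not the Flipkart features) shows both metrics agreeing when clusters coincide with classes:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Two well-separated blobs: both metrics should come out high
X, y = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=42)
pred = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

sil = silhouette_score(X, pred)     # label-free cohesion/separation
ari = adjusted_rand_score(y, pred)  # agreement with ground truth
print(f"silhouette={sil:.3f}, ARI={ari:.3f}")
```

The SWIFT result above (silhouette 0.143 but ARI 0.0012) is the disagreeing case: compact clusters that do not follow the category labels.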

Interpretation

  • Handcrafted features alone are insufficient for robust classification.
  • Deep pretrained embeddings already encode category-relevant patterns.
  • CLIP embeddings form compact clusters but did not align with these categories unsupervised; their joint text-image grounding remains promising for multimodal fusion.

Feasibility Verdict: Image-only features (deep > classical) are viable for top-level category discrimination; supervised fine-tuning (Section 6) is justified.

6. Transfer Learning: Unsupervised VGG16 Feature Analysis¶

In [27]:
import os

# --- 1) Setup ---
image_dir = 'dataset/Flipkart/Images'
print(f"Using image directory: {image_dir}")

# --- 2) Data preparation ---
df_prepared = df.copy()

# keep only rows whose image file exists in image_dir
available_images = set(os.listdir(image_dir))
df_prepared = df_prepared[df_prepared['image'].isin(available_images)].reset_index(drop=True)
print(f"Found {len(df_prepared)} rows with existing image files.")

# full path for each image
df_prepared['image_path'] = df_prepared['image'].apply(lambda img: os.path.join(image_dir, img))

def sample_data(df_in, min_samples=8, samples_per_category=150):
    counts = df_in['product_category'].value_counts()
    valid = counts[counts >= min_samples].index
    df_f = df_in[df_in['product_category'].isin(valid)]
    return df_f.groupby('product_category', group_keys=False).apply(
        lambda x: x.sample(min(len(x), samples_per_category), random_state=42)
    ).reset_index(drop=True)

df_sampled = sample_data(df_prepared, min_samples=8, samples_per_category=150)
print(f"Sampled {len(df_sampled)} items across {df_sampled['product_category'].nunique()} categories.")
Using image directory: dataset/Flipkart/Images
Found 1050 rows with existing image files.
Sampled 1050 items across 7 categories.
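
The capped, minimum-count sampling above can be checked on a toy frame (hypothetical data, same logic as `sample_data`):

```python
import pandas as pd

def sample_data(df_in, min_samples=8, samples_per_category=150):
    # Drop categories with too few rows, then cap each remaining category
    counts = df_in['product_category'].value_counts()
    valid = counts[counts >= min_samples].index
    df_f = df_in[df_in['product_category'].isin(valid)]
    return df_f.groupby('product_category', group_keys=False).apply(
        lambda x: x.sample(min(len(x), samples_per_category), random_state=42)
    ).reset_index(drop=True)

toy = pd.DataFrame({'product_category': ['A'] * 5 + ['B'] * 20 + ['C'] * 300})
sampled = sample_data(toy)
print(sampled['product_category'].value_counts().to_dict())
# 'A' (< 8 rows) is dropped; 'B' is kept whole; 'C' is capped at 150
```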
In [28]:
import importlib
import src.classes.transfer_learning_classifier_unsupervised as tlcu

# reload the module to pick up code changes
importlib.reload(tlcu)

# import the class after reload
from src.classes.transfer_learning_classifier_unsupervised import TransferLearningClassifierUnsupervised


# --- 3) Unsupervised pipeline (VGG16 whole CNN) ---
image_column = 'image_path'
category_column = 'product_category'

vgg_extractor = TransferLearningClassifierUnsupervised(
    input_shape=(224, 224, 3),
    backbones=['VGG16'],
    use_include_top=False
)

_ = vgg_extractor.prepare_data_from_dataframe(
    df=df_sampled,
    image_column=image_column,
    category_column=category_column,
    image_dir=None  # image_column already has full paths
)
processed_images = vgg_extractor._load_images()

# features
vgg_features = vgg_extractor._extract_features('VGG16')

# elbow
optimal_components, elbow_fig = vgg_extractor.find_optimal_pca_components(
    vgg_features, max_components=500, step_size=75
)
elbow_fig.show()

# PCA
vgg_features_pca, pca_info, scaler_vgg = vgg_extractor.apply_dimensionality_reduction(
    vgg_features, n_components=optimal_components, method='pca'
)

# t-SNE
vgg_features_tsne, tsne_info, _ = vgg_extractor.apply_dimensionality_reduction(
    vgg_features_pca, n_components=2, method='tsne'
)

# clustering
vgg_clustering_results = vgg_extractor.perform_clustering(
    vgg_features_pca, n_clusters=None, cluster_range=(7, 7)
)

# dashboard
vgg_dashboard = vgg_extractor.create_analysis_dashboard(
    backbone_name='VGG16',
    original_features=vgg_features,
    reduced_features=vgg_features_pca,
    clustering_results=vgg_clustering_results,
    processing_times=vgg_extractor.processing_times,
    pca_info=pca_info
)
vgg_dashboard.show()

# compare with categories
vgg_analysis_results = vgg_extractor.compare_with_categories(
    df=vgg_extractor.df,
    tsne_features=vgg_features_tsne,
    clustering_results=vgg_clustering_results,
    backbone_name='VGG16'
)

# ARI
vgg_ari = vgg_analysis_results['ari_score']
if 'ari_scores' not in globals():
    ari_scores = {}
ari_scores['VGG16'] = vgg_ari
print(f"VGG16 ARI: {vgg_ari:.4f}")
Prepared 1050 samples for unsupervised analysis.
Loaded 1050 images for feature extraction.
Extracting VGG16 features:   0%|          | 0/132 [00:00<?, ?batch/s]
VGG16 features shape: (1050, 512) (include_top=False)
🔍 Finding optimal number of PCA components...
Testing PCA components:   0%|          | 0/7 [00:00<?, ?components/s]
✅ Optimal number of components: 75
Applying PCA to reduce dimensions from 512 to 75...
PCA completed: 68.11% of variance preserved
Applying t-SNE to reduce dimensions to 2...
t-SNE progress:   0%|          | 0/100 [00:00<?, ?%/s]
t-SNE completed
🎯 Performing clustering analysis...
Finding optimal number of clusters in range (7, 7)...
Testing cluster counts:   0%|          | 0/1 [00:00<?, ?k/s]
Optimal number of clusters: 7 (silhouette score: 0.067)
Performing KMeans clustering with 7 clusters...
Clustering completed: 7 clusters, silhouette score: 0.067
🔍 VGG16 Analysis: Comparing clustering with real product categories...
📊 VGG16 processed 1050 images
📂 Unique categories: 7
🎯 Adjusted Rand Index(ARI): 0.3491
🔗 Cluster quality (Silhouette): 0.067

📊 Creating side-by-side comparison: Real Categories vs Clusters...
🔍 VGG16 Side-by-Side Comparison:
VGG16 ARI: 0.3491
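
`find_optimal_pca_components` picks a component count from the explained-variance elbow; a minimal variant that keeps just enough components to reach a target cumulative variance (random stand-in data and a 68% threshold chosen to mirror the run above, not the class's exact criterion):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Stand-in for the (1050, 512) VGG16 feature matrix
features = rng.normal(size=(200, 64))

pca = PCA().fit(features)
cumvar = np.cumsum(pca.explained_variance_ratio_)

target = 0.68  # the run above preserved ~68% of variance with 75 components
n_components = int(np.searchsorted(cumvar, target) + 1)
print(n_components, cumvar[n_components - 1])
```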
In [29]:
# Create a copy to avoid modifying the original dictionary in place
combined_ari_scores = ari_scores.copy()


# Import existing plotting function
from src.scripts.plot_ari_comparison import ari_comparison

# Create and display the final, combined visualization
print("\n📈 Creating final comparison plot...")
final_comparison_fig = ari_comparison(combined_ari_scores)
final_comparison_fig.show()
📈 Creating final comparison plot...

7. Transfer Learning (VGG16) – Supervised¶

Goal: Classify product images into categories using a pretrained CNN to reduce training time and overfitting.

Model

  • Backbone: VGG16 (ImageNet weights, frozen)
  • Head: GlobalAveragePooling → Dense(1024, ReLU) → Dropout(0.5) → Dense(num_classes, softmax)
  • Variants:
    • base_vgg16 (no augmentation)
    • augmented_vgg16 (with image augmentations)

Data

  • Images resized to 224×224
  • VGG16 preprocessing applied
  • Stratified train / val / test split
  • Optional sampling to ensure minimum samples per class
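
The stratified 60/20/20 split (test_size=0.2, then val_size=0.25 of the remainder) can be sketched with scikit-learn; the arrays here are illustrative stand-ins for paths and labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1050)               # stand-ins for image paths
y = np.repeat(np.arange(7), 150)  # 7 balanced categories, 150 each

# Hold out 20% for test, stratified by category
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 25% of the remaining 80% -> 20% of the total for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 630 210 210
```

These counts match the 630/210/210 split reported in the training logs below.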

Augmentations (augmented model)

  • Horizontal flip
  • Small rotations
  • Brightness / zoom tweaks

Training

  • Optimizer: Adam
  • Loss: Categorical crossentropy
  • Batch size: 8
  • Epochs: up to 10 (early stopping patience=3)
  • Only classification head is trainable
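
Early stopping with patience=3 halts training once validation loss has not improved for three consecutive epochs; a minimal sketch of that logic (the real mechanism is Keras's EarlyStopping callback):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch      # patience exhausted
    return len(val_losses)

# Validation losses from the base_vgg16 run below
history = [3.1286, 2.5273, 2.1912, 2.1683, 1.8205, 2.0928, 2.0238, 2.3136]
print(early_stop_epoch(history))  # 8, matching "Epoch 8: early stopping"
```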

Tracked Outputs

  • Train / val loss & accuracy curves
  • Best model selected by validation loss
  • Confusion matrix for best model
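
The head's parameter counts are easy to verify by hand: a Dense(1024) layer on the 512-dim pooled vector costs 512·1024 weights plus 1024 biases, and Dense(7) costs 1024·7 plus 7:

```python
# Parameters of the head on top of VGG16's pooled (512,) features
pooled_dim = 512             # GlobalAveragePooling2D output size
hidden, classes = 1024, 7

dense_params = pooled_dim * hidden + hidden  # weights + biases
out_params = hidden * classes + classes

print(dense_params, out_params)  # 525312 7175
```

These match the 525,312 and 7,175 entries in the model summaries printed by the training cell.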
In [30]:
from src.classes.transfer_learning_classifier import TransferLearningClassifier


# --- 3. Model Training ---

# Initialize classifier with explicit parameters for reproducibility
classifier = TransferLearningClassifier(
    input_shape=(224, 224, 3)
)

# Prepare data - the classifier will now receive full, verified paths
data_summary = classifier.prepare_data_from_dataframe(
    df_sampled, 
    image_column='image_path',      # Use the column with full paths
    category_column='product_category',  # Use the clean category column
    test_size=0.2,
    val_size=0.25, 
    random_state=42
)
print("\n✅ Data prepared for transfer learning:")
print(f"   🎯 Classes: {data_summary['num_classes']}")
print(f"   Train/Val/Test split: {data_summary['train_size']}/{data_summary['val_size']}/{data_summary['test_size']}")

# Prepare image arrays for training
classifier.prepare_arrays_method()
print("✅ Image arrays prepared for training.")

# Train models with more conservative parameters for stability
print("\n🚀 Training VGG16 models...")

# Base model
base_model = classifier.create_base_model(show_backbone_summary=True)
results1 = classifier.train_model(
    'base_vgg16', 
    base_model, 
    epochs=10,      # Reduced for faster, more stable initial training
    batch_size=8,   # Smaller batch size to prevent memory issues
    patience=3
)

# Augmented model
aug_model = classifier.create_augmented_model()
results2 = classifier.train_model(
    'augmented_vgg16', 
    aug_model, 
    epochs=10,
    batch_size=8,
    patience=3
)
print("✅ Training complete.")

# --- 4. Results and Visualization ---
print("\n📈 Displaying results...")
# Compare models
comparison_fig = classifier.compare_models()
comparison_fig.show()

# Plot training history
history_fig = classifier.plot_training_history()
history_fig.show()

# Plot confusion matrix for the best model
summary = classifier.get_summary()
if summary['best_model']:
    best_model_name = summary['best_model']['name']
    print(f"📊 Plotting confusion matrix for best model: {best_model_name}")
    conf_fig = classifier.plot_confusion_matrix(best_model_name)
    conf_fig.show()

# Print final summary
print("\n📋 Final Summary:")
print(summary)
🔧 Transfer Learning Classifier initialized
   📊 Input shape: (224, 224, 3)
   🎯 GPU Available: 0
🔄 Preparing data from DataFrame...
   📁 Using default image directory: dataset/Flipkart/Images
   📋 Categories found: ['Baby Care', 'Beauty and Personal Care', 'Computers', 'Home Decor & Festive Needs', 'Home Furnishing', 'Kitchen & Dining', 'Watches']
   🎯 Number of classes: 7
   📊 Train samples: 630
   📊 Validation samples: 210
   📊 Test samples: 210

✅ Data prepared for transfer learning:
   🎯 Classes: 7
   Train/Val/Test split: 630/210/210
🔄 Preparing data using arrays method...
   🖼️ Loading 630 images...
Loading & preprocessing:   0%|          | 0/630 [00:00<?, ?img/s]
   ✅ Successfully loaded 630 images (0 failures)
   🖼️ Loading 210 images...
Loading & preprocessing:   0%|          | 0/210 [00:00<?, ?img/s]
   ✅ Successfully loaded 210 images (0 failures)
   🖼️ Loading 210 images...
Loading & preprocessing:   0%|          | 0/210 [00:00<?, ?img/s]
   ✅ Successfully loaded 210 images (0 failures)
   📊 Train set: (630, 224, 224, 3)
   📊 Validation set: (210, 224, 224, 3)
   📊 Test set: (210, 224, 224, 3)
✅ Image arrays prepared for training.

🚀 Training VGG16 models...
🔧 Creating base model with VGG16...
=== Backbone Summary (Frozen) ===
Model: "vgg16"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                      ┃ Output Shape             ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_2 (InputLayer)        │ (None, 224, 224, 3)      │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block1_conv1 (Conv2D)             │ (None, 224, 224, 64)     │         1,792 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block1_conv2 (Conv2D)             │ (None, 224, 224, 64)     │        36,928 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block1_pool (MaxPooling2D)        │ (None, 112, 112, 64)     │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block2_conv1 (Conv2D)             │ (None, 112, 112, 128)    │        73,856 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block2_conv2 (Conv2D)             │ (None, 112, 112, 128)    │       147,584 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block2_pool (MaxPooling2D)        │ (None, 56, 56, 128)      │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block3_conv1 (Conv2D)             │ (None, 56, 56, 256)      │       295,168 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block3_conv2 (Conv2D)             │ (None, 56, 56, 256)      │       590,080 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block3_conv3 (Conv2D)             │ (None, 56, 56, 256)      │       590,080 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block3_pool (MaxPooling2D)        │ (None, 28, 28, 256)      │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block4_conv1 (Conv2D)             │ (None, 28, 28, 512)      │     1,180,160 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block4_conv2 (Conv2D)             │ (None, 28, 28, 512)      │     2,359,808 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block4_conv3 (Conv2D)             │ (None, 28, 28, 512)      │     2,359,808 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block4_pool (MaxPooling2D)        │ (None, 14, 14, 512)      │             0 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block5_conv1 (Conv2D)             │ (None, 14, 14, 512)      │     2,359,808 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block5_conv2 (Conv2D)             │ (None, 14, 14, 512)      │     2,359,808 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block5_conv3 (Conv2D)             │ (None, 14, 14, 512)      │     2,359,808 │
├───────────────────────────────────┼──────────────────────────┼───────────────┤
│ block5_pool (MaxPooling2D)        │ (None, 7, 7, 512)        │             0 │
└───────────────────────────────────┴──────────────────────────┴───────────────┘
 Total params: 14,714,688 (56.13 MB)
 Trainable params: 0 (0.00 B)
 Non-trainable params: 14,714,688 (56.13 MB)
   ✅ Base model created and compiled.
Model: "functional_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_3 (InputLayer)      │ (None, 224, 224, 3)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ vgg16 (Functional)              │ (None, 7, 7, 512)      │    14,714,688 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_1      │ (None, 512)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 1024)           │       525,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 7)              │         7,175 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 15,247,175 (58.16 MB)
 Trainable params: 532,487 (2.03 MB)
 Non-trainable params: 14,714,688 (56.13 MB)
🔄 Training model: base_vgg16...
Training epochs:   0%|          | 0/10 [00:00<?, ?epoch/s]
Epoch 1/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 1: val_accuracy improved from -inf to 0.76190, saving model to models/base_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 35s 337ms/step - accuracy: 0.6079 - loss: 3.6145 - val_accuracy: 0.7619 - val_loss: 3.1286
Epoch 2/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 2: val_accuracy improved from 0.76190 to 0.80000, saving model to models/base_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 342ms/step - accuracy: 0.8206 - loss: 1.4161 - val_accuracy: 0.8000 - val_loss: 2.5273
Epoch 3/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 3: val_accuracy improved from 0.80000 to 0.81905, saving model to models/base_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 346ms/step - accuracy: 0.8476 - loss: 0.7986 - val_accuracy: 0.8190 - val_loss: 2.1912
Epoch 4/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 4: val_accuracy did not improve from 0.81905
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 348ms/step - accuracy: 0.9000 - loss: 0.5771 - val_accuracy: 0.7952 - val_loss: 2.1683
Epoch 5/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 5: val_accuracy did not improve from 0.81905
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 349ms/step - accuracy: 0.9333 - loss: 0.3024 - val_accuracy: 0.8143 - val_loss: 1.8205
Epoch 6/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 6: val_accuracy did not improve from 0.81905
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 348ms/step - accuracy: 0.9571 - loss: 0.1652 - val_accuracy: 0.8190 - val_loss: 2.0928
Epoch 7/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 7: val_accuracy did not improve from 0.81905
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 349ms/step - accuracy: 0.9587 - loss: 0.1975 - val_accuracy: 0.7952 - val_loss: 2.0238
Epoch 8/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 8: val_accuracy did not improve from 0.81905
79/79 ━━━━━━━━━━━━━━━━━━━━ 36s 349ms/step - accuracy: 0.9603 - loss: 0.1656 - val_accuracy: 0.8000 - val_loss: 2.3136
Epoch 8: early stopping
✅ Training completed in 292.34s
   📊 Test accuracy: 0.8048
   📊 ARI Score: 0.6026
🔧 Creating augmented model with VGG16 for fine-tuning...
🔧 Creating base model with VGG16...
   ✅ Base model created and compiled.
Model: "functional_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_5 (InputLayer)      │ (None, 224, 224, 3)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ vgg16 (Functional)              │ (None, 7, 7, 512)      │    14,714,688 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_2      │ (None, 512)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1024)           │       525,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 7)              │         7,175 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 15,247,175 (58.16 MB)
 Trainable params: 532,487 (2.03 MB)
 Non-trainable params: 14,714,688 (56.13 MB)
   ✅ Model re-compiled for fine-tuning with a lower learning rate.
Model: "functional_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_5 (InputLayer)      │ (None, 224, 224, 3)    │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ vgg16 (Functional)              │ (None, 7, 7, 512)      │    14,714,688 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ global_average_pooling2d_2      │ (None, 512)            │             0 │
│ (GlobalAveragePooling2D)        │                        │               │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 1024)           │       525,312 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 1024)           │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 7)              │         7,175 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 15,247,175 (58.16 MB)
 Trainable params: 7,611,911 (29.04 MB)
 Non-trainable params: 7,635,264 (29.13 MB)
🔄 Training model: augmented_vgg16...
Training epochs:   0%|          | 0/10 [00:00<?, ?epoch/s]
Epoch 1/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 1: val_accuracy improved from -inf to 0.47143, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 431ms/step - accuracy: 0.2143 - loss: 4.2050 - val_accuracy: 0.4714 - val_loss: 1.4983
Epoch 2/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 2: val_accuracy improved from 0.47143 to 0.60000, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 74s 817ms/step - accuracy: 0.3968 - loss: 1.7426 - val_accuracy: 0.6000 - val_loss: 1.2167
Epoch 3/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 3: val_accuracy improved from 0.60000 to 0.65714, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 435ms/step - accuracy: 0.5349 - loss: 1.2708 - val_accuracy: 0.6571 - val_loss: 1.0425
Epoch 4/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 4: val_accuracy improved from 0.65714 to 0.70000, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 431ms/step - accuracy: 0.6270 - loss: 1.0081 - val_accuracy: 0.7000 - val_loss: 0.9115
Epoch 5/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 5: val_accuracy improved from 0.70000 to 0.72857, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 426ms/step - accuracy: 0.7159 - loss: 0.8201 - val_accuracy: 0.7286 - val_loss: 0.8349
Epoch 6/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 6: val_accuracy did not improve from 0.72857
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 427ms/step - accuracy: 0.7921 - loss: 0.6276 - val_accuracy: 0.7286 - val_loss: 0.8119
Epoch 7/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 7: val_accuracy improved from 0.72857 to 0.74762, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 45s 455ms/step - accuracy: 0.8032 - loss: 0.5557 - val_accuracy: 0.7476 - val_loss: 0.7650
Epoch 8/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 8: val_accuracy improved from 0.74762 to 0.75714, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 436ms/step - accuracy: 0.8365 - loss: 0.4626 - val_accuracy: 0.7571 - val_loss: 0.7435
Epoch 9/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 9: val_accuracy improved from 0.75714 to 0.77143, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 428ms/step - accuracy: 0.8857 - loss: 0.3473 - val_accuracy: 0.7714 - val_loss: 0.7342
Epoch 10/10:   0%|          | 0/79 [00:00<?, ?batch/s]
Epoch 10: val_accuracy improved from 0.77143 to 0.77619, saving model to models/augmented_vgg16_best.keras
79/79 ━━━━━━━━━━━━━━━━━━━━ 43s 427ms/step - accuracy: 0.8889 - loss: 0.3091 - val_accuracy: 0.7762 - val_loss: 0.7306
WARNING:tensorflow:5 out of the last 140 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x42b7634c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
WARNING:tensorflow:5 out of the last 140 calls to <function TensorFlowTrainer.make_predict_function.<locals>.one_step_on_data_distributed at 0x42b7634c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
✅ Training completed in 467.57s
   📊 Test accuracy: 0.7952
   📊 ARI Score: 0.5797
✅ Training complete.

📈 Displaying results...
📊 Comparing models...
📊 Plotting training history...
📊 Plotting confusion matrix for best model: base_vgg16
📊 Plotting confusion matrix for base_vgg16...
📋 Final Summary:
{'data': {'num_classes': 7, 'class_names': ['Baby Care', 'Beauty and Personal Care', 'Computers', 'Home Decor & Festive Needs', 'Home Furnishing', 'Kitchen & Dining', 'Watches'], 'train_size': 630, 'val_size': 210, 'test_size': 210}, 'models': {'base_vgg16': {'accuracy': 0.8047618865966797, 'loss': 2.1196353435516357, 'training_time': 292.3367371559143}, 'augmented_vgg16': {'accuracy': 0.7952380776405334, 'loss': 0.8911551833152771, 'training_time': 467.5734751224518}}, 'best_model': {'name': 'base_vgg16', 'test_accuracy': 0.8047618865966797, 'test_loss': 2.1196353435516357, 'val_accuracy': 0.8190476298332214, 'training_time': 292.3367371559143}}
In [31]:
# Call the new method to get the interactive plot
example_fig = classifier.plot_prediction_examples(
    model_name=best_model_name,
    num_correct=4,  # Show 4 correct predictions
    num_incorrect=4 # Show 4 incorrect predictions
)


example_fig.show()
🖼️ Visualizing prediction examples for model: base_vgg16

8. Overall Conclusion & Next Steps¶

Progress So Far

  • Text pipeline: preprocessing + multiple embeddings (TF-IDF, Word2Vec, BERT, USE) with measurable clustering quality.
  • Image pipeline: traditional vs deep vs vision-language features benchmarked.
  • Supervised transfer learning (VGG16) trained (base + augmented) with early stopping, evaluation, confusion matrix, prediction inspection.

Integrated Insight

  • Both modalities independently achieve meaningful structure (text + images).
  • Deep image features narrow the gap to supervised performance.
  • Augmentation reduces overfitting (val acc stabilized vs base).

Gaps

  • No multimodal fusion model yet (late/early fusion or joint encoder).
  • Limited metrics (need F1 / per-class recall).
  • No fine-tuning of backbone or alternative architectures (ResNet/EfficientNet).
  • Limited reproducibility (multi-seed, logging, config management).
  • No interpretability (Grad-CAM, saliency, SHAP).

Next High-Impact Steps

  1. Add per-class metrics + macro/micro F1.
  2. Fine-tune last conv blocks (discriminative unfreezing).
  3. Introduce multimodal fusion (concatenate text + image embeddings).
  4. Run ≥3 seeds; report mean ± std.
  5. Interpretability: Grad-CAM on correct vs misclassified.
  6. Add alternative backbone (e.g., EfficientNetB0) for baseline diversity.
  7. Automate experiment tracking (MLflow / Weights & Biases).
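
Step 1 is cheap to add with scikit-learn; a sketch on a toy prediction set (labels illustrative, not from the models above):

```python
from sklearn.metrics import f1_score, classification_report

y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

macro = f1_score(y_true, y_pred, average='macro')  # unweighted mean over classes
micro = f1_score(y_true, y_pred, average='micro')  # global over all samples
print(f"macro={macro:.3f}, micro={micro:.3f}")     # macro=0.700, micro=0.714

# Per-class precision / recall / F1, as called for in step 1
print(classification_report(y_true, y_pred, zero_division=0))
```

Macro F1 exposes weak classes that overall accuracy hides, which matters here given the visually similar subcategories flagged earlier.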

Final Feasibility Statement Both text and image modalities independently support category prediction; transfer learning confirms supervised viability. A fused multimodal classifier is likely to exceed current single-modality performance and is the logical next deliverable.